Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction

Authors

  • Hilary S. Parker
  • Jeffrey T. Leek
  • Alexander V. Favorov
  • Michael Considine
  • Xiaoxin Xia
  • Sameer Chavan
  • Christine H. Chung
  • Elana J. Fertig
Abstract

MOTIVATION: Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms that remove these artifacts enhance differences between known biological covariates, but they also risk removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging with standard algorithms, which are designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori.

RESULTS: We therefore assess the extent to which various batch correction algorithms remove true biological heterogeneity. We also introduce an algorithm, permuted-SVA (pSVA), which uses a new statistical model that is blind to biological covariates to correct for technical artifacts while retaining biological heterogeneity in genomic data. This algorithm facilitated accurate subtype identification in head and neck cancer from gene expression data in both formalin-fixed and frozen samples. When applied to predict Human Papillomavirus (HPV) status, pSVA improved cross-study validation even when the sample batches were highly confounded with HPV status in the training set.

AVAILABILITY AND IMPLEMENTATION: All analyses were performed using R version 2.15.0. The code and data used to generate the results of this manuscript are available from https://sourceforge.net/projects/psva.

Similar resources

Removing batch effects for prediction problems with frozen surrogate variable analysis

Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely-publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. But genomic technologies are beginning to be used in clinical applications where sam...

Bayesian Gaussian Process Latent Variable Models for pseudotime inference in single-cell RNA-seq data

Single-cell genomics has revolutionised modern biology while requiring the development of advanced computational and statistical methods. Advances have been made in uncovering gene expression heterogeneity, discovering new cell types and novel identification of genes and transcription factors involved in cellular processes. One such approach to the analysis is to construct pseudotime orderings ...

Dosimetric Study of Tissue Heterogeneity Correction for Breast Conformal Radiotherapy

Introduction: Heterogeneity correction is an important parameter in dose calculation for cancer patients, because differing tissue densities may cause inaccuracy in the calculated dose. This study examined dose calculation for breast cancer patients with and without heterogeneity correction. Material and Methods: Twenty breast cancer patients were treated with Th...

iSARST: an integrated SARST web server for rapid protein structural similarity searches

iSARST is a web server for efficient protein structural similarity searches. It is a multi-processor, batch-processing and integrated implementation of several structural comparison tools and two database searching methods: SARST for common structural homologs and CPSARST for homologs with circular permutations. iSARST allows users to submit multiple PDB/SCOP entry IDs or an archive file conta...

A Power Series Solution for Free Vibration of Variable Thickness Mindlin Circular Plates with Two-Directional Material Heterogeneity and Elastic Foundations

In the present paper, a semi-analytical solution is presented for free vibration analysis of circular plates with complex combinations of the geometric parameters, edge conditions, material heterogeneity, and elastic foundation coefficients. The presented solution covers many engineering applications. The plate is assumed to have a variable thickness and to be made of a heterogeneous material whose p...

Journal:
  • Bioinformatics

Volume 30, Issue 19

Pages: -

Publication date: 2014